APEX -- The APL Parallel Executor

Author

  • Robert Bernecky
Abstract

Master of Science, Graduate Department of Computer Science, University of Toronto, 1997

APEX is an APL-to-SISAL compiler that generates high-performance, portable, parallel code. The generated code executes up to several hundred times faster than interpreted APL, with serial performance of kernels competitive with FORTRAN. Preliminary results indicate that acceptable multi-processor speedup is achievable. The excellent run-time performance of APEX-generated code arises from attention to all aspects of program execution: run-time syntax analysis is eliminated, setup costs are reduced, algebraic identities and phrase recognition detect special cases, some matrix products exploit a generalization of sparse-matrix algebra, and loop fusion and copy optimizations eliminate many array-valued temporaries. In addition, the compiler exploits Static Single Assignment and array morphology, our generalization of data flow analysis to arrays, to generate run-time primitives that use superior algorithms and simpler storage types. Extensions to APL, including rank, cut, and a monadic operand for dyadic reduction, improve compiled and interpreted code performance.


Similar resources

An Approach for Proving the Correctness of Inspector/Executor Transformations

To take advantage of multicore parallelism, programmers and compilers rewrite, or transform, programs to expose loop-level parallelism. Showing the correctness, or legality, of such program transformations enables their incorporation into compilers. However, the correctness of inspector/executor strategies, which develop parallel schedules at runtime for computations with nonaffine array access...

Effectiveness of Message Strip-mining for Regular and Irregular Communication

Languages such as High Performance Fortran are used to implement parallel algorithms by distributing large data structures across a multicomputer system. To hide communication behind computation, we introduce an optimization scheme, message strip-mining. By using this scheme, the communication overhead is almost completely overlapped with the subsequent computation. We have implemented the prop...

Parallelization Techniques for Sparse Matrix Applications

Sparse matrix problems are difficult to parallelize efficiently on distributed memory machines since data is often accessed indirectly. Inspector/executor strategies, which are typically used to parallelize loops with indirect references, incur substantial run-time preprocessing overheads when references with multiple levels of indirection are encountered, a frequent occurrence in sparse matrix al...

A Fast Parallel Graph Partitioner for Shared-Memory Inspector/Executor Strategies

Graph partitioners play an important role in many parallel work distribution and locality optimization approaches. Surprisingly, however, to our knowledge there is no freely available parallel graph partitioner designed for execution on a shared memory multicore system. This paper presents a shared memory parallel graph partitioner, ParCubed, for use in the context of sparse tiling run-time dat...

Publication date: 1997